The gene encoding the spike glycoprotein (S) of bovine enteric coronavirus (BECV) was cloned and its complete sequence of 4092 nucleotides was determined. This sequence contained a single long open reading frame with a coding capacity of 1363 amino acids (Mr 150747). The predicted protein had 19 potential N-glycosylation sites. A signal sequence of 17 amino acids was observed starting from the first methionine residue. A potential peptidase cleavage site was located between amino acids 763 and 767. These cleavages would explain the maturation of the primary product of the S gene to the S1 (Mr 104692) and S2 (Mr 84175) spike structural proteins. Two amphipathic α-helices (amino acids 1007 to 1077 and 1269 to 1294), which may constitute the 12 nm stalk of the viral spike, were also observed; another α-helix (amino acids 1305 to 1335) may be involved in anchoring the spike in the viral membrane. Comparison of this protein sequence with the homologous S protein sequences described for mouse hepatitis virus (MHV) strains A59 and JHM led us to suggest that the MHV-A59 and MHV-JHM S genes could be derived from the BECV S gene by deletion.


Bovine enteric coronavirus (BECV) was first identified 
by electron microscopy in faecal samples of calves 
suffering from acute enteritis (Mebus et al., 1973a). The 
involvement of BECV in the aetiology of diarrhoeal 
diseases has been suggested in several studies (Mebus et
al., 1973b; Bridger et al., 1978; Gouet et al., 1978).
During the acute stage of infection, virus particles are 
excreted in large amounts and have been identified 
within brush border cells of the small intestine and in 
differentiated colonic epithelial cells. Although propagation
of BECV is difficult in conventional cell lines
(Mebus et al., 1973a), it has been grown successfully in 
HRT18 cells (human rectal tumour cell line; Laporte et 
al., 1980). 

As a member of the Coronaviridae family, BECV is a pleiomorphic, enveloped, spherical particle (120 nm in diameter) surrounded by a fringe of 20 nm long, club-shaped spikes. The coronavirus genome is a positive-sense, single-stranded, capped RNA with a polyadenylated 3' end (Siddell et al., 1982; Sturman & Holmes, 1983). The structural and non-structural proteins of the virus are translated from a 3'-coterminal nested set of mRNAs, each having a common 5' leader sequence (Lai et al., 1984). Only the unique 5'-terminal region of each mRNA is translated; this region is absent from the next smaller RNA of the set.


BECV possesses five main structural proteins: a phosphorylated nucleocapsid protein (N; 50K), a transmembrane matrix glycoprotein (M; 28K) and three peplomer glycoproteins [S1 and S2 (S, spike) and haemagglutinin (HA)]. Glycoproteins S1 (105K) and S2 (95K) are the cleavage products of the S gene-encoded primary product (180K; J. F. Vautherot, unpublished results; Deregt & Babiuk, 1987). HA (Mr 125K) is split by reducing agents into two subunits of equal size with an Mr of 65K (Laporte & Bobulesco, 1981; King & Brian, 1982); the neutralizing epitopes are located on S1 and HA (Vautherot et al., 1984).

The cloning and sequencing of the gene encoding the S 
protein of BECV, a necessary first step for the 
production of a genetically engineered vaccine against 
BECV, is reported in this paper. 

BECV strain F15 (BECV-F15) was grown in HRT18 cells; virus and genomic RNA were purified as described previously (Cruciere & Laporte, 1988). The virus genome was used as a template for cDNA synthesis. Poly(dC)-tailed RNA-cDNA heteroduplexes were inserted into dG-tailed, PstI-linearized pBR322. The cDNAs were then cloned in Escherichia coli RR1; tetracycline-resistant, ampicillin-susceptible colonies were transferred onto nitrocellulose filters and lysed in situ (Maniatis et al., 1982). Colony DNA was hybridized overnight at 42 °C in hybridization buffer (5× SSC, 50% formamide, 5× Denhardt's solution and 100 µg/ml of sonicated calf thymus DNA) containing a 32P-labelled viral cDNA insert prepared by random priming (Feinberg & Vogelstein, 1983).

Fig. 1. (a) Schematic diagram of part of the BECV genome and location of cDNA clones. A simplified restriction map is given; the inserts used as probes to screen the cDNA library are shaded grey; the sequence of the S gene was obtained with the clones that are boxed in the figure. (b) Alignment and comparison of the amino acid sequences of the MHV-A59, MHV-JHM and BECV S propolypeptides. The percentage homology is given; arrows indicate putative peptidase cleavage sites.

The filters were washed three times in 0.1× SSC, 0.1% SDS at 60 °C for 20 min, dried and autoradiographed on X-ray film (Fuji) at -70 °C for 12 h. Mini-lysates were prepared from the positive clones (Birnboim & Doly, 1979). After digestion with PstI, the selected inserts were analysed on Southern blots (Maniatis et al., 1982). Northern blots were used to localize the inserts. Total RNA from BECV-infected HRT18 cells was extracted by the guanidinium isothiocyanate method according to Vaquero et al. (1982) and electrophoresed in a denaturing 1% agarose gel (6% formaldehyde). Transfer, hybridization and washing were performed as described for the Southern blot experiments.

The primer used to obtain viral cDNA corresponded to the BamHI cleavage site (GGATCC), which, at the beginning of our study, had been found in a cDNA clone at the 5' end of the gene encoding the N protein. (In fact, this restriction site was not located there, and the primer actually primed at random.) The primer yielded 3500 cDNA clones.
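The primer was based on the BamHI recognition sequence GGATCC. Locating such a site is a plain string search, and because GGATCC is its own reverse complement, a scan of one strand also finds every site on the other. This is a minimal illustrative sketch: the function names are hypothetical and the example sequence is invented.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper()))


def find_sites(seq: str, site: str = "GGATCC") -> list[int]:
    """1-based start positions of every occurrence of `site` in `seq`.

    For a palindromic site such as GGATCC (BamHI), top-strand hits are also
    the bottom-strand hits, so a single scan is enough.
    """
    seq, site = seq.upper(), site.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start + 1)               # convert 0-based index to 1-based
        start = seq.find(site, start + 1)    # keep scanning after this hit
    return hits


# Invented sequence carrying two BamHI sites.
print(find_sites("aaggatccttggatcc"))        # [3, 11]
```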


By Northern blot analysis (data not shown), we established that the large cDNA insert G7 (2.4 kb) covered part of the S gene (Fig. 1); sequence analysis of this clone and comparison with the sequence of the gene coding for the E2 protein of mouse hepatitis coronavirus strain JHM (MHV-JHM) (Schmidt et al., 1987) showed that this insert mapped in the middle of the S gene. G7 cDNA was used to screen the cDNA library and to obtain further clones. Clones P G7 8 and P 27 40 were also used as probes to identify cDNA clones located at the 5' and 3' ends of the S gene, respectively (Fig. 1).

Both DNA strands were sequenced on five overlapping cDNA clones: P G7, P G7 8, P G7 8 12, P 27 40 and P 33 23 (Fig. 1a). After restriction mapping and Southern blot analysis had established that these clones covered the whole length of the S gene, M13 dideoxynucleotide sequencing was carried out according to Sanger et al. (1977), using sonicated cloned fragments subcloned into the SmaI site of the M13mp10 vector (Deininger, 1983). Buffer gradient gels and [α-35S]dATP were used according to Biggin et al. (1983).
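Before sequencing, the five overlapping clones were shown to cover the whole S gene. Once clone end points are placed on a common coordinate system, that check reduces to an interval-union test. The sketch assumes 1-based inclusive coordinates; the function name is hypothetical and the clone coordinates in the example are invented, not read from Fig. 1a.

```python
def covers(clones: list[tuple[int, int]], start: int, end: int) -> bool:
    """True if the union of inclusive intervals covers start..end without a gap.

    Intervals that overlap or abut (e.g. 1..100 and 101..200) count as contiguous.
    """
    reach = start - 1                    # last position covered so far
    for s, e in sorted(clones):
        if s > reach + 1:                # a gap opens before this clone
            return False
        reach = max(reach, e)
        if reach >= end:
            return True
    return False                         # clones ran out before reaching `end`


# Hypothetical clone coordinates checked against an ORF spanning 31..4122.
print(covers([(1, 1500), (1200, 2600), (2500, 4200)], 31, 4122))   # True
print(covers([(1, 1500), (1200, 2600), (2700, 4200)], 31, 4122))   # False (gap 2601..2699)
```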

Sequence data were analysed and assembled with the aid of the program of Queen & Korn (1984), the Beckman Microgenie program (March 1985 version, Beckman Instruments) adapted for the IBM PC-XT microcomputer. The nucleotide sequence obtained (Fig. 2) contains a single long open reading frame (ORF) in the mRNA sense, which extends from the first ATG codon (nucleotides 31 to 33) to nucleotide 4122. This 4092 nucleotide sequence contains 28% A, 16.2% C, 19.7% G and 36.2% T, and has a coding capacity of 1363 amino acids (Fig. 2), with an Mr of 150747 and a predicted pI of 5.6.
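The ORF boundaries and base composition quoted above follow from simple sequence bookkeeping. The sketch below finds the ORF that starts at the first ATG and runs to the first in-frame stop codon, and reports base percentages; it is a minimal illustration on an invented sequence, not the Microgenie procedure, and the function names are hypothetical. It also checks the arithmetic in the text, assuming nucleotide 4122 is the last base of the stop codon: 4122 - 31 + 1 = 4092 nt, i.e. 1364 codons, or 1363 amino acids plus a terminator.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg_orf(seq: str):
    """(start, end), 1-based inclusive, of the ORF beginning at the first ATG and
    ending with the last base of the first in-frame stop codon; None if absent."""
    seq = seq.upper()
    i = seq.find("ATG")
    if i == -1:
        return None
    for j in range(i, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            return i + 1, j + 3
    return None                         # no in-frame stop codon found


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, C, G and T, rounded to one decimal place."""
    seq = seq.upper()
    return {b: round(100 * seq.count(b) / len(seq), 1) for b in "ACGT"}


# Arithmetic from the text: ORF from nucleotide 31 to 4122, stop codon included.
orf_len = 4122 - 31 + 1
assert orf_len == 4092 and orf_len % 3 == 0
amino_acids = orf_len // 3 - 1          # the last codon is the terminator
print(orf_len, amino_acids)             # 4092 1363
```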

The length of the ORF is in the expected size range for a glycoprotein S gene. Furthermore, comparison with the published S gene sequence of coronavirus MHV-A59 (Luytjes et al., 1987) shows a homology that increases from 61% at the 5' end of the ORF (BECV nucleotides 31 to 1513) to 73.8% at the 3' end (BECV nucleotides 2711 to 4100).
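The text reports homology as a percentage but does not state how it was computed. One common convention is percent identity over the aligned columns in which neither sequence has a gap; the sketch below implements that convention as an assumption, with hypothetical function names and toy sequences. Other conventions (for example, dividing by the full alignment length) give different numbers.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned, equal-length sequences ('-' = gap).

    Columns with a gap in either sequence are excluded from the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    identical = sum(x == y for x, y in cols)
    return 100.0 * identical / len(cols)


def region_identity(a: str, b: str, start: int, end: int) -> float:
    """Identity restricted to alignment columns start..end (1-based, inclusive),
    e.g. to compare the 5' and 3' parts of an ORF separately."""
    return percent_identity(a[start - 1:end], b[start - 1:end])


print(percent_identity("ATGC-A", "ATGTTA"))   # 80.0
```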

Immediately upstream from the first initiation codon 
there is a sequence ATCTAAACAT very similar to the 
conserved intergenic sequences of BECV, MHV-JHM 
and MHV-A59 (Lapps et al., 1987; Cruciere & Laporte, 
1988; Luytjes et al., 1987; Schmidt et al., 1987); it is also 
closely related to the conserved sequence AACTAAAC, 
reported for the transmissible gastroenteritis virus 
(TGEV) (Rasschaert & Laude, 1987). The sequence 
surrounding the translation initiation codon is in a suboptimal
context (Kozak, 1987). A similar situation
has been observed for the initiation codon of the S gene 
of MHV (Luytjes et al., 1987). 
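Kozak's analysis emphasizes two positions around the initiation codon: a purine at -3 and a G at +4 (counting the A of ATG as +1). The sketch below reads those two positions for a given ATG. It does not score the full consensus, the function name is hypothetical, and the example sequences are invented rather than taken from the BECV S gene.

```python
def kozak_key_positions(seq: str, atg_pos: int) -> dict:
    """Report the -3 and +4 nucleotides for the ATG whose A is at 1-based atg_pos."""
    seq = seq.upper()
    i = atg_pos - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg_pos}")
    minus3 = seq[i - 3] if i >= 3 else None            # three bases upstream of the A
    plus4 = seq[i + 3] if i + 3 < len(seq) else None   # base right after ATG
    return {
        "-3": minus3,
        "+4": plus4,
        "purine_at_-3": minus3 in ("A", "G"),
        "G_at_+4": plus4 == "G",
    }


print(kozak_key_positions("GCCACCATGG", 7))   # strong context: A at -3, G at +4
print(kozak_key_positions("TTTCCCATGT", 7))   # weak context: C at -3, T at +4
```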

Comparison of the first 400 nucleotides of the PG7 8 12 clone (P. Boireau & J. Laporte, unpublished observations) with the recently published HA-encoding gene sequence of bovine coronavirus (Parker et al., 1989) led us to conclude that the S gene lies just downstream of the HA gene.

The deduced amino acid sequence of the BECV S protein contains 19 potential N-glycosylation sites. Assuming a mean Mr of 2100 per carbohydrate chain (Hunter et al., 1983), the Mr of the mature S glycoprotein would be approximately 190600. The protein appears to be hydrophobic over most of its length (35% hydrophobic amino acids).
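Potential N-glycosylation sites are defined by the Asn-X-Ser/Thr sequon with X not Pro, the rule given in the Fig. 2 legend. A lookahead regular expression finds every sequon, including overlapping ones. This is an illustrative scan on an invented peptide with a hypothetical function name; applying it to the BECV protein would require a clean transcription of the sequence in Fig. 2.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The zero-width lookahead
# lets overlapping sequons (e.g. N-N-S-T) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of the Asn in every potential N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(sequon_positions("MNGTANPSNKT"))   # [2, 9]; Asn 6 is skipped because X = Pro
```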

This protein shares some properties with S proteins 
described for other coronaviruses. After the first methionine
residue, there is a potential signal sequence with a
hydrophobic core of 13 amino acids and a helix-breaking 
residue, glycine 17 (Watson, 1984). According to the rule 
established for a potential cleavage site (von Heijne, 
1984), a protease could act between amino acid residues 
17 and 18. The consequence would be that the first amino 
acid in the S protein is aspartic acid 18. 
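The von Heijne (-3,-1) rule constrains the residues at positions -1 and -3 relative to the signal peptidase cleavage site. The sketch below encodes a simplified version as it is commonly summarized (a small residue at -1; no aromatic, charged or large polar residue at -3). The residue sets are an assumption, not a reproduction of von Heijne's weight matrix, and the test peptides are invented. For the BECV protein, the text places residue -1 at glycine 17.

```python
# Simplified (-3,-1) rule; the residue sets below are an assumption.
ALLOWED_AT_MINUS1 = set("AGSCT")               # small residues
FORBIDDEN_AT_MINUS3 = set("FHWY" "DEKR" "NQ")  # aromatic, charged, large polar


def fits_minus3_minus1(protein: str, minus1: int) -> bool:
    """Check a putative cleavage after residue `minus1` (1-based).

    The mature protein would then start at residue minus1 + 1.
    """
    p = protein.upper()
    if minus1 < 3 or minus1 >= len(p):
        raise ValueError("cleavage site too close to a sequence end")
    return p[minus1 - 1] in ALLOWED_AT_MINUS1 and p[minus1 - 3] not in FORBIDDEN_AT_MINUS3


print(fits_minus3_minus1("AAAAVIGDL", 7))   # True: Gly at -1, Val at -3
```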

Another potential peptidase cleavage site is located in the hydrophilic peak between residues 763 and 769. This sequence, Lys-Arg-Arg-Ser-Val-Arg, is collinear with the experimentally determined cleavage site of the MHV-A59 S protein (Luytjes et al., 1987) and very similar to the cleavage site of the infectious bronchitis virus (IBV) S protein (Binns et al., 1985). Furthermore, this series of basic amino acids resembles the tryptic cleavage sites of peptide prohormones and of the F0 protein of Newcastle disease virus (McGinnes & Morrison, 1986), which are processed in the trans-Golgi apparatus. The coronavirus S protein follows the same cellular pathway for maturation, and budding of the virus takes place in the Golgi apparatus and at the endoplasmic reticulum membrane.
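Cleavage sites of this kind show up as short clusters of Lys and Arg. A sliding-window count is one simple way to flag them. The window length (6) and threshold (4 basic residues) below are assumptions chosen so that a motif like Lys-Arg-Arg-Ser-Val-Arg is reported; the function name is hypothetical and the flanking residues of the test peptide are illustrative only.

```python
def basic_clusters(protein: str, window: int = 6, min_basic: int = 4) -> list[tuple[int, int]]:
    """1-based (start, end) spans where a window of `window` residues holds at
    least `min_basic` Lys/Arg; overlapping windows are merged into one span."""
    p = protein.upper()
    spans: list[tuple[int, int]] = []
    for i in range(len(p) - window + 1):
        if sum(c in "KR" for c in p[i:i + window]) >= min_basic:
            s, e = i + 1, i + window
            if spans and s <= spans[-1][1]:      # overlaps previous span: extend it
                spans[-1] = (spans[-1][0], e)
            else:
                spans.append((s, e))
    return spans


peptide = "GYCVDYST" + "KRRSVR" + "AITTGY"       # motif from the text, toy flanks
print(basic_clusters(peptide))                   # [(9, 14)]
```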

Tryptic cleavage would explain the maturation of the S protein of BECV: the primary gene product, P150 (i.e. the S polypeptide with an Mr of 148967 without the signal peptide), is glycosylated in the endoplasmic reticulum and in the Golgi apparatus, giving rise to a glycoprotein of Mr 188867 (gp190), which is then cleaved into the S1 (Mr 104692) and S2 (Mr 84175) structural proteins. These results are in agreement with those published by Deregt & Babiuk (1987), who used immunoelectrophoresis to study the biosynthesis of gp105 and gp95, and with our own observations (J. F. Vautherot et al., unpublished results).
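The molecular weights quoted for the precursor, gp190, S1 and S2 are linked by simple arithmetic, assuming, as the text does, that all 19 potential sites carry a chain of mean Mr 2100. The check below restates the text's own figures as a worked example; it is not an independent estimate, and the variable names are illustrative.

```python
MR_PER_CHAIN = 2100        # mean Mr per carbohydrate chain (Hunter et al., 1983)
N_CHAINS = 19              # assumes every potential N-glycosylation site is used

precursor_full = 150747    # deduced polypeptide, signal peptide included
p150 = 148967              # polypeptide after removal of the signal peptide
s1, s2 = 104692, 84175     # cleavage products

carbohydrate = N_CHAINS * MR_PER_CHAIN            # 39900
mature_estimate = precursor_full + carbohydrate   # 190647, quoted as ~190600
gp190 = p150 + carbohydrate                       # 188867

assert gp190 == s1 + s2    # S1 + S2 account for the whole glycosylated precursor
print(carbohydrate, mature_estimate, gp190)       # 39900 190647 188867
```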

Using the approach described by de Groot et al. (1987) for the S proteins of feline infectious peritonitis virus, MHV and IBV, and by Rasschaert & Laude (1987) for the S protein of TGEV, we also identified two amphipathic α-helices in the S protein of BECV, located between amino acids 1007 and 1077 and between amino acids 1269 and 1294. These α-helices may constitute the stalk of the viral spike, which is approximately 12 nm long.
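The authors followed de Groot et al. (1987). A different but widely used way to quantify the amphipathicity of a putative α-helix is Eisenberg's hydrophobic moment, which sums hydrophobicity vectors rotated by 100° per residue. The sketch below uses the Kyte & Doolittle hydropathy scale; both the scale and the use of the hydrophobic moment are assumptions, not the method used in the text, and the toy helices are invented. A helix whose hydrophobic residues face one side gives a large moment; a uniformly hydrophobic one gives a moment near zero.

```python
import math

# Kyte & Doolittle (1982) hydropathy scale (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment: str, degrees_per_residue: float = 100.0) -> float:
    """Mean hydrophobic moment of a segment modelled as an ideal α-helix."""
    step = math.radians(degrees_per_residue)
    seg = segment.upper()
    x = sum(KD[a] * math.cos(k * step) for k, a in enumerate(seg))
    y = sum(KD[a] * math.sin(k * step) for k, a in enumerate(seg))
    return math.hypot(x, y) / len(seg)


# Toy 18-residue helices: Leu on the face where cos > 0 and Lys elsewhere,
# versus an all-Leu helix whose vectors cancel over five full turns.
amphipathic = "".join("L" if math.cos(math.radians(100 * k)) > 0 else "K" for k in range(18))
uniform = "L" * 18
print(round(hydrophobic_moment(amphipathic), 2), round(hydrophobic_moment(uniform), 2))
```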

Using the method described by Rao & Argos (1986), a hydrophobic transmembrane α-helix was also predicted between amino acids 1305 and 1335. Its C-terminal location, the presence of a potential myristylation site on glycine 1333 and comparison with other coronaviruses suggest that this helix is involved in anchoring the spike in the viral membrane. The myristylation site, surrounded by eight cysteine residues, would be located at the internal face of the viral membrane. Among coronaviruses, this domain is highly conserved in structure, location and length. A stretch of eight amino acids, Lys-Trp-Pro-Trp-Tyr-Val-Trp-Leu (residues 1305 to 1312; Fig. 2), is found in all coronavirus S protein sequences established so far; however, its role is unknown.
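The transmembrane helix above was predicted with the method of Rao & Argos (1986). A simpler, widely used alternative, swapped in here for illustration, is a Kyte & Doolittle sliding-window hydropathy plot, in which a 19-residue window averaging about 1.6 or more is often taken as a candidate membrane-spanning segment. The window, threshold, function names and test peptides below are assumptions; the sketch also locates the conserved Lys-Trp-Pro-Trp-Tyr-Val-Trp-Leu motif by exact search.

```python
from typing import Optional

# Kyte & Doolittle (1982) hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[tuple[int, float]]:
    """(1-based window centre, mean hydropathy) for every full window."""
    p = protein.upper()
    return [
        (i + window // 2 + 1, sum(KD[a] for a in p[i:i + window]) / window)
        for i in range(len(p) - window + 1)
    ]


def candidate_tm_centres(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Window centres whose mean hydropathy reaches the threshold."""
    return [c for c, score in hydropathy_profile(protein, window) if score >= threshold]


def motif_start(protein: str, motif: str = "KWPWYVWL") -> Optional[int]:
    """1-based start of the first exact occurrence of `motif`, or None."""
    i = protein.upper().find(motif)
    return i + 1 if i != -1 else None


toy = "K" * 5 + "L" * 20 + "K" * 5       # invented hydrophobic core with charged ends
print(candidate_tm_centres(toy))
```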

Fig. 2. Nucleotide sequence of the gene encoding the S protein and predicted amino acid sequence of the S protein. [Figure: nucleotide sequence, nt 1 to 4200, with the deduced amino acid sequence, residues 1 to 1363.] The sequence deduced from the cDNA inserts (Fig. 1a) is shown as positive-sense DNA from the 5' to the 3' end; the hatched bar indicates the proposed N-terminal signal sequence. An arrowhead indicates the potential tryptic cleavage site. A box surrounds the conserved intergenic sequence. Potential N-glycosylation sites are underlined (Asn-X-Ser/Thr sequence, with X different from Pro). Dots mark the proposed C-terminal membrane-anchoring domain. The last eight cysteines are circled.

Luytjes et al. (1987, 1988) have compared the S amino acid sequences of the two MHV strains A59 and JHM. These sequences are very similar, but the S protein of JHM was found to be shorter. As BECV belongs to the same antigenic group as these viruses, the present results extend this comparison. Dot matrix analysis (Beckman Microgenie program) of the deduced amino acid sequences of the S genes of BECV and MHV-A59 (data not shown) revealed low homology (55%) between amino acids 488 to 593 of BECV and amino acids 481 to 545 of MHV-A59, and apparently there is a deletion in the MHV-A59 S protein sequence. Alignment of the two sequences (Fig. 1b) provided more detail: amino acids 488 to 534 of BECV have no counterpart in MHV-A59. Furthermore, amino acids 452 to 593 of the S protein of BECV have no counterpart in MHV-JHM (Fig. 1b).
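A dot-matrix comparison places a dot wherever a short word of one sequence matches a word of the other: aligned regions form diagonals, and an insertion or deletion shows up as a diagonal that resumes with an offset. The sketch below uses exact matches of a fixed word length (3, an assumed value); many dot-plot programs instead score a sliding window with some mismatch tolerance, so this is a simplification. Function names and toy sequences are hypothetical.

```python
def dot_matrix(a: str, b: str, word: int = 3) -> set[tuple[int, int]]:
    """(i, j) pairs, 1-based, where the `word`-long substrings of a at i and b at j match."""
    a, b = a.upper(), b.upper()
    index: dict[str, list[int]] = {}
    for j in range(len(b) - word + 1):          # index every word of b by position
        index.setdefault(b[j:j + word], []).append(j + 1)
    return {
        (i + 1, j)
        for i in range(len(a) - word + 1)
        for j in index.get(a[i:i + word], [])
    }


def render(a: str, b: str, word: int = 3) -> str:
    """Text dot plot: rows follow `a`, columns follow `b`."""
    dots = dot_matrix(a, b, word)
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(1, len(b) + 1))
        for i in range(1, len(a) + 1)
    )


# The extra 'A' in the second sequence shifts the diagonal by one column.
print(render("GDLKCT", "GDLAKCT"))
```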

Luytjes et al. (1987, 1988) put forward two hypotheses to account for the difference between MHV-A59 and MHV-JHM in this domain of the S protein. The first is that the MHV-JHM genome carries a deletion of the nucleotide sequence corresponding to amino acids 453 to 545 of the S protein of MHV-A59; the second is that the MHV-A59 genome has acquired genomic material by non-homologous recombination. The comparisons presented above support the idea of genetic instability in this region of the virus genome, which would explain the difference in length of the S proteins of the three virus strains. We therefore suggest an evolutionary progression from MHV-JHM to MHV-A59 to BECV, or the reverse, depending upon whether non-homologous recombination events or deletions have occurred. The observation that the nucleotide sequence of the S gene encoding amino acids 470 to 480 (data not shown) is highly homologous (81%) between BECV and MHV-A59 but absent from MHV-JHM leads us to propose that deletions have occurred in this genome segment. Our suggestion is supported by the identification of a functional HA gene in BECV (Parker et al., 1989), whereas only a related HA pseudogene was identified in MHV-A59 (Luytjes et al., 1988); this situation is most likely if the BECV genome is more closely related to a possible common ancestor.
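The deletion hypotheses above rest on residue ranges that face gaps in an alignment (for example, BECV amino acids 452 to 593 with no counterpart in MHV-JHM). The sketch below performs that mapping for a pairwise alignment given as two equal-length strings with '-' for gaps. It does not compute the alignment itself, and the function name and toy alignments are hypothetical.

```python
def unmatched_segments(a: str, b: str) -> list[tuple[int, int]]:
    """Residue ranges of `a` (1-based, inclusive, gaps not counted) that face gaps in `b`."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    segments: list[tuple[int, int]] = []
    pos = 0              # residue number in the ungapped sequence a
    run_start = None     # first residue of the current unmatched run
    run_end = 0
    for x, y in zip(a, b):
        if x == "-":
            continue     # gap in a: no residue of a to classify
        pos += 1
        if y == "-":
            if run_start is None:
                run_start = pos
            run_end = pos
        elif run_start is not None:   # residue with a counterpart closes the run
            segments.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        segments.append((run_start, run_end))
    return segments


print(unmatched_segments("MKTLLVAG", "MK----AG"))   # [(3, 6)]
```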

We would like to thank Jean Francois Vautherot, Denis Rasschaert 
and Michel Bremont for helpful discussions. 